Poster: CoJaBo Date: Aug 7, 2012 4:03pm
Forum: web Subject: Re: Domainsponsor.com erasing prior archived copies of 135,000+^W 24 million+ domains

Disallowing the user agent "*" will not cause removal. It will stop crawling as expected, but only the specific user-agent named on the removal FAQ page will actually cause removal of past content.

I do agree that the directive should explicitly say something like "Remove" to prevent such a mistake. Others off-site have pointed out that some "bad bot blacklists" also include these lines without explaining what "ia_archiver" actually is. It's possible DomainSponsor got the lines from a source like that and didn't realize they would cause *removal* of the content from the Archive, since by that stage they would have been far separated from the removal FAQ entry.
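To make the distinction concrete, here is a rough Python sketch of the rule as I understand it: a wildcard "*" block only stops crawling, while a block that names the Archive's user-agent is what triggers removal. The function name `explicitly_blocks` is made up for illustration, and treating "ia_archiver" plus "Disallow: /" as the trigger is my assumption from the lines discussed here; this is not the Archive's actual code.

```python
def explicitly_blocks(robots_txt: str, agent: str = "ia_archiver") -> bool:
    """Return True if a group that names `agent` contains 'Disallow: /'.

    Hypothetical helper: a wildcard ('*') group is deliberately NOT
    counted, mirroring the behaviour described in the post above.
    """
    in_group = False        # True while the current group names `agent`
    reading_agents = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                in_group = False  # a rule line came before, so a new group starts
            reading_agents = True
            if value.lower() == agent.lower():
                in_group = True
        else:
            reading_agents = False
            if field == "disallow" and value == "/" and in_group:
                return True
    return False


wildcard_only = "User-agent: *\nDisallow: /\n"
named_agent = "User-agent: ia_archiver\nDisallow: /\n"

print(explicitly_blocks(wildcard_only))  # wildcard block: crawling stops, no removal
print(explicitly_blocks(named_agent))    # named block: the case that causes removal
```

A copy-pasted "bad bot blacklist" that lists ia_archiver among other agents would fall into the second case, which is the scenario described above.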

If anyone's been following the list of sites registered to their nameserver (that is, sites being removed from the Archive in this way), it has increased nearly two-hundred-fold since I made this post; the current count is over 24 *million*.
I'm not sure whether this indicates they are expanding that rapidly or that the index site is simply catching up with their existing registrations; I suspect the latter.

Poster: Jeremy Leader Date: Aug 7, 2012 4:37pm
Forum: web Subject: Re: Domainsponsor.com erasing prior archived copies of 135,000+^W 24 million+ domains

OK, CoJaBo, thanks for that clarification.

So there's no way to say "Internet Archive, don't crawl my site, but don't delete the archive", while still allowing other crawlers to crawl the site?

Poster: CoJaBo Date: Aug 7, 2012 4:40pm
Forum: web Subject: Re: Domainsponsor.com erasing prior archived copies of 135,000+^W 24 million+ domains

It doesn't seem so. The FAQ only mentions those lines for removal; it doesn't seem to give an option for "don't crawl the site anymore, but still keep the existing content".

I had hoped someone from either DomainSponsor or the Archive would have responded to my emails by now.